From free shallow monolingual resources to machine translation systems

Authors

  • Helena M. Caseli
  • Maria das Graças V. Nunes
  • Mikel L. Forcada
Abstract

The availability of machine-readable bilingual linguistic resources is crucial not only for machine translation but also for other applications such as cross-lingual information retrieval. However, building such resources demands extensive manual work. This paper describes a methodology for automatically building bilingual dictionaries and transfer rules by extracting knowledge from word-aligned parallel corpora processed with free shallow monolingual resources (morphological analysers and part-of-speech taggers). Experiments on Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts have shown promising results.
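
To make the idea of extracting a bilingual dictionary from word-aligned, shallow-analysed text more concrete, here is a minimal sketch. It is not the authors' method. It assumes each token has already been reduced to a (lemma, part-of-speech) pair by a morphological analyser and tagger, and that a word aligner has produced index pairs linking source and target tokens. The function name `extract_dictionary`, the `min_count` threshold, the tag labels, and the toy Portuguese–Spanish data are all hypothetical.

```python
from collections import Counter, defaultdict


def extract_dictionary(aligned_pairs, min_count=2):
    """Build a toy bilingual dictionary from word-aligned sentence pairs.

    aligned_pairs: iterable of (src_tokens, tgt_tokens, alignment), where
    each token is a (lemma, pos) tuple and alignment is a list of
    (src_index, tgt_index) pairs. Both formats are assumptions made
    for this illustration.
    """
    counts = defaultdict(Counter)
    for src_tokens, tgt_tokens, alignment in aligned_pairs:
        for i, j in alignment:
            # Count how often each source entry is aligned to each target entry.
            counts[src_tokens[i]][tgt_tokens[j]] += 1

    dictionary = {}
    for src, tgt_counts in counts.items():
        # Keep the most frequent translation if it is seen often enough.
        tgt, n = tgt_counts.most_common(1)[0]
        if n >= min_count:
            dictionary[src] = tgt
    return dictionary


# Hypothetical toy corpus: "a casa branca" / "la casa blanca" and "a casa" / "la casa".
corpus = [
    (
        [("o", "det"), ("casa", "n"), ("branco", "adj")],
        [("el", "det"), ("casa", "n"), ("blanco", "adj")],
        [(0, 0), (1, 1), (2, 2)],
    ),
    (
        [("o", "det"), ("casa", "n")],
        [("el", "det"), ("casa", "n")],
        [(0, 0), (1, 1)],
    ),
]

dictionary = extract_dictionary(corpus, min_count=2)
```

With this toy data, the determiner and noun pairs are each seen twice and are kept, while the adjective pair is seen only once and is filtered out by the threshold. A real system would also need to handle multiword units, ambiguous alignments, and the induction of transfer rules, none of which this sketch attempts.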

Similar resources

From free shallow monolingual resources to machine translation systems: easing the task

The availability of machine-readable bilingual linguistic resources is crucial not only for machine translation but also for other applications such as cross-lingual information retrieval. However, the building of such resources demands extensive manual work. This paper describes a methodology to build automatically bilingual dictionaries and transfer rules by extracting knowledge from word-ali...

Sharing resources between free/open-source rule-based machine translation systems: Grammatical Framework and Apertium

In this paper, we describe two methods developed for sharing linguistic data between two free and open source rule based machine translation systems: Apertium, a shallow-transfer system; and Grammatical Framework (GF), which performs a deeper syntactic transfer. In the first method, we describe the conversion of lexical data from Apertium to GF, while in the second one we automatically extract ...

Rapid development of RBMT systems for related languages

The article describes a new way of constructing rule-based machine translation (RBMT) systems. RBMT systems are currently among the best performing machine translation systems. Most of the "big named" machine translation systems (Systran, 2007) (Promt, 2007) belong to this category, but these systems have a big drawback; construction of such systems demands a great amount of time and resources, ...

Expanding Parallel Resources for Medium-Density Languages for Free

We discuss a previously proposed method for augmenting parallel corpora of limited size for the purposes of machine translation through monolingual paraphrasing of the source language. We develop a three-stage shallow paraphrasing procedure to be applied to the Swedish-Bulgarian language pair for which limited parallel resources exist. The source language exhibits specifics not typical of high-...

Statistical Machine Translation without Parallel Data

We examine approaches to statistical machine translation (SMT) without parallel data. SMT has achieved impressive performance by leveraging large amounts of parallel data in the source and target languages. But such data is available only for a few language pairs and domains. Using human annotation to create new parallel corpora sufficient for building a good translation system is too expensive...

Publication date: 2008